gtd algorithm
Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting
In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, the RL algorithms will estimate the value function more accurately and find a better policy. When the state space is large or continuous, \emph{Gradient-based Temporal Difference (GTD)} policy evaluation algorithms with linear function approximation are widely used. Considering that the collection of the evaluation data is both time and reward consuming, a clear understanding of the finite sample performance of the policy evaluation algorithms is very important to reinforcement learning. Under the assumption that data are i.i.d.
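For readers skimming these abstracts, a minimal sketch of the underlying setup may help: the value function is approximated linearly as $V(s) \approx \theta^{\top}\phi(s)$ and $\theta$ is learned from transitions sampled by the evaluated policy. The snippet below shows plain TD(0) with linear function approximation as a baseline; the feature map phi, step sizes, and transition format are illustrative assumptions rather than details from the paper, and GTD-style methods replace this semi-gradient step with a gradient-corrected one.

```python
import numpy as np

# Minimal illustrative sketch (not from the paper): policy evaluation with a
# linear value function V(s) ~= theta^T phi(s), learned by plain TD(0) from
# transitions (s, r, s') generated while following the evaluated policy.
def td0_linear(transitions, phi, num_features, alpha=0.01, gamma=0.99):
    theta = np.zeros(num_features)
    for s, r, s_next in transitions:
        # TD error under the current parameter vector.
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        # Semi-gradient TD(0) step; GTD-style methods instead follow the
        # gradient of an objective such as NEU or MSPBE (see entries below).
        theta += alpha * delta * phi(s)
    return theta
```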
- North America > Canada > Alberta (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Minnesota (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting
Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, the RL algorithms will estimate the value function more accurately and find a better policy. When the state space is large or continuous, Gradient-based Temporal Difference (GTD) policy evaluation algorithms with linear function approximation are widely used. Considering that the collection of the evaluation data is both time and reward consuming, a clear understanding of the finite sample performance of the policy evaluation algorithms is very important to reinforcement learning. Under the assumption that data are i.i.d.
- North America > Canada > Alberta (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)
A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using $L$-$\lambda$ Smoothness
Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first $O(d)$ ($d$ is the number of features) algorithms that have convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et al. (2018) proved that the convergence rates of GTD, GTD2 and TDC are $O(t^{-\alpha/2})$ for some $\alpha \in (0,1)$. This bound is tight (Dalal et al., 2020), and slower than $O(1/\sqrt{t})$. GTD algorithms also have two step-size parameters, which are difficult to tune. In the literature, there is a "single-time-scale" formulation of GTD; however, this formulation still has two step-size parameters. This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of Expected TD Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as $O(1/t)$. Furthermore, based on a generalization of expected smoothness (Gower et al., 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate. Our rate also improves on Gower et al.'s result, giving a tighter bound under a weaker assumption. Besides Impression GTD, we also prove the rates of three other GTD algorithms: one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all four GTD algorithms are proved in a single generic GTD framework to which $L$-$\lambda$ smoothness applies. Empirical results on random walks, the Boyan chain, and the Baird counterexample show that Impression GTD converges much faster than existing GTD algorithms for both on-policy and off-policy learning problems, with well-performing step-sizes over a wide range.
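As a rough illustration of what minimizing NEU with a single step size looks like (a generic sketch under assumed notation, not the paper's Impression GTD update): $\mathrm{NEU}(\theta) = \|\mathbb{E}[\delta\,\phi]\|^{2}$, and its gradient $2\,\mathbb{E}[(\gamma\phi' - \phi)\phi^{\top}]\,\mathbb{E}[\delta\,\phi]$ can be estimated without bias by using two independent transitions.

```python
import numpy as np

# Generic sketch (assumptions, not the paper's Impression GTD): one stochastic
# gradient step on NEU(theta) = ||E[delta * phi]||^2 using a single step size.
# Two independent transitions A and B keep the product estimate unbiased; the
# constant factor 2 in the gradient is absorbed into alpha.
def neu_gradient_step(theta, sample_a, sample_b, alpha=0.01, gamma=0.99):
    phi_a, _, phip_a = sample_a        # (phi(s), r, phi(s')) for transition A
    phi_b, r_b, phip_b = sample_b      # independent transition B
    delta_b = r_b + gamma * phip_b @ theta - phi_b @ theta   # TD error on B
    grad = np.outer(gamma * phip_a - phi_a, phi_a) @ (delta_b * phi_b)
    return theta - alpha * grad
```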
- North America > Canada > Alberta (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Leisure & Entertainment > Games (0.45)
- Education (0.34)
Finite-Sample Analysis of Proximal Gradient TD Algorithms
Liu, Bo, Liu, Ji, Ghavamzadeh, Mohammad, Mahadevan, Sridhar, Petrik, Marek
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has not been much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms w.r.t. a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
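For context, the saddle-point reformulation referenced here can be written in standard notation (a sketch using generic definitions, up to constant factors, not quoted from the paper). With $\phi, \phi'$ the feature vectors of consecutive states, $r$ the reward and $\gamma$ the discount factor, let
\[
A = \mathbb{E}\big[\phi(\phi - \gamma\phi')^{\top}\big], \qquad b = \mathbb{E}[r\,\phi], \qquad M = \mathbb{E}[\phi\phi^{\top}].
\]
Then
\[
\mathrm{MSPBE}(\theta) \;\propto\; \|b - A\theta\|_{M^{-1}}^{2} \;=\; \max_{w}\Big(2\,\langle b - A\theta,\, w\rangle - \|w\|_{M}^{2}\Big),
\]
so GTD2 can be read as stochastic descent in $\theta$ and ascent in $w$ on the bilinear objective $L(\theta, w) = \langle b - A\theta,\, w\rangle - \tfrac{1}{2}\|w\|_{M}^{2}$; taking $M$ to be the identity recovers the NEU objective minimized by GTD.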
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
Liu, Bo, Gemp, Ian, Ghavamzadeh, Mohammad, Liu, Ji, Mahadevan, Sridhar, Petrik, Marek
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal ``mirror maps'' to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
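To make the saddle-point view above concrete, here is a minimal sketch of the standard GTD2 two-variable update (step sizes and the transition format are assumptions for illustration; the accelerated GTD2-MP variant with proximal mirror maps described in the abstract is not shown).

```python
import numpy as np

# Minimal sketch of the standard GTD2 update, viewed as stochastic descent in
# theta and ascent in the dual variable w on <b - A*theta, w> - 0.5*||w||_M^2.
# Step sizes alpha/beta, discount gamma and the inputs are illustrative.
def gtd2_step(theta, w, phi, r, phi_next, alpha=0.005, beta=0.05, gamma=0.99):
    delta = r + gamma * phi_next @ theta - phi @ theta            # TD error
    w = w + beta * (delta - phi @ w) * phi                        # dual (ascent) step on w
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)  # primal step on theta
    return theta, w
```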
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (6 more...)
- Energy > Energy Storage (1.00)
- Electrical Industrial Apparatus (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Yang, Zhuoran, Chen, Yongxin, Hong, Mingyi, Wang, Zhaoran
Compared with the classical policy gradient algorithm (Williams, 1992), actor-critic tracks the action-value function (critic) in policy gradient in an online manner, and alternately updates the policy (actor) and the critic. On the one hand, the online update of the critic significantly reduces the variance of the policy gradient and hence leads to faster convergence. On the other hand, it also introduces algorithmic instability, which is often observed in practice (Islam et al., 2017) and parallels the notoriously unstable training of generative adversarial networks (Pfau and Vinyals, 2016). Such instability of actor-critic originates from several intertwining challenges, including (i) function approximation of actor and critic, (ii) improper choice of stepsizes, (iii) the noise arising from stochastic approximation, (iv) the asynchrony between actor and critic, and (v) possibly off-policy data used in the update of the critic. As a result, the convergence of actor-critic remains much less well understood than that of policy gradient, which itself remains an open question. Consequently, the practical use of actor-critic often lacks theoretical guidance. In this paper, we aim to theoretically understand the algorithmic instability of actor-critic.
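Schematically (and only as an assumed, generic sketch, not the linear-quadratic setting analyzed in this paper), one actor-critic round alternates an online TD update of the critic with a policy-gradient update of the actor that uses the critic's current estimate:

```python
import numpy as np

# Generic actor-critic round (illustrative assumptions, not the paper's LQR
# algorithm): the critic is a linear action-value estimate Q(s, a) ~= w^T phi(s, a)
# updated online by TD; the actor takes a policy-gradient step using that estimate.
def actor_critic_step(theta, w, sample, phi, grad_log_pi,
                      alpha_actor=0.001, alpha_critic=0.01, gamma=0.99):
    s, a, r, s_next, a_next = sample                      # drawn by the current policy
    q_sa = phi(s, a) @ w
    delta = r + gamma * phi(s_next, a_next) @ w - q_sa    # TD error (critic target)
    w = w + alpha_critic * delta * phi(s, a)              # critic: online TD step
    theta = theta + alpha_actor * q_sa * grad_log_pi(s, a, theta)  # actor: policy gradient
    return theta, w
```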
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Minnesota (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.50)
- Workflow (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)